Libraries imported successfully. Environment configured for regression analysis.
Local file not found. Retrieving from GitHub repository...
Data retrieved and cached locally.
Sample size: 6,319 listings
Variables: 13 columns
Price range: £10.00 - £74100.00

Dataset Overview:
Observations: 6,319
Variables: 13
Sample preview (first 5 observations):
| | price | accommodates | bedrooms | beds | room_type | property_type | latitude | longitude | availability_365 | minimum_nights | maximum_nights | number_of_reviews | bathrooms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 126.0 | 4 | 1.0 | 1.0 | Entire home/apt | Entire rental unit | 51.514609 | -0.136069 | 39 | 1 | 365 | 15 | 1.0 |
| 1 | 225.0 | 5 | 3.0 | 3.0 | Entire home/apt | Entire home | 51.398840 | -0.290510 | 315 | 2 | 70 | 21 | 1.5 |
| 2 | 2400.0 | 8 | 4.0 | 4.0 | Entire home/apt | Entire rental unit | 51.500550 | -0.017170 | 364 | 1 | 1125 | 0 | 3.0 |
| 3 | 150.0 | 4 | 2.0 | 2.0 | Entire home/apt | Entire condo | 51.506070 | -0.218960 | 273 | 2 | 120 | 38 | 2.0 |
| 4 | 180.0 | 6 | 4.0 | 2.0 | Entire home/apt | Entire home | 51.441898 | -0.195032 | 353 | 5 | 365 | 1 | 2.5 |
Note: 64 listings above £1214 excluded from chart for clarity
Average price: £220.48 per night
Median price: £135.00
Cheapest listing: £10.00
Most expensive listing: £74100.00
Note: 64 extreme outliers (> £1214) excluded for clarity
Note: Showing 99% of data (prices ≤ £1214)
Average prices by room type:
- Hotel room: £549.11/night
- Entire home/apt: £281.90/night
- Private room: £85.99/night
- Shared room: £38.84/night
Note: 64 extreme price outliers excluded for clarity
Note: 62 extreme price outliers excluded for clarity
Correlation Interpretation:
- r > 0.7: Strong positive association
- 0.3 < r ≤ 0.7: Moderate positive association
- r < -0.3: Moderate negative association
- |r| ≤ 0.3: Weak or no linear relationship
Geographic Insights:
- Central London (higher density) shows elevated pricing
- Price gradient visible from city center to periphery
- Yellow/light colors = Higher priced listings
- Purple/dark colors = Lower priced listings
Availability Statistics:
- Mean availability: 217 days/year
- Median availability: 248 days/year
- Fully available (365 days): 349 listings (5.5%)
- Not available (0 days): 81 listings (1.3%)
Business Insight: Bimodal distribution suggests full-time vs. occasional hosting strategies

Minimum Nights Statistics:
- Median minimum nights: 2
- 1-night stays allowed: 2,336 listings (37.0%)
- Weekly minimum (7+ nights): 762 listings (12.1%)
Business Insight: Longer minimum stays often correlate with lower nightly rates (volume pricing strategy)
Outlier Detection (IQR Method):
- Q1 (25th percentile): £77.00
- Q3 (75th percentile): £223.50
- IQR: £146.50
- Lower bound: -£142.75 (negative, so no listing falls below it; all outliers are on the high side)
- Upper bound: £443.25
- Outliers detected: 457 listings (7.2%)
- Price range of outliers: £444.00 - £74100.00
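The bounds above are consistent with the usual 1.5 × IQR fence rule. A minimal sketch of that arithmetic, taking the reported quartiles as inputs; the function name `iqr_bounds` and the multiplier parameter `k` are illustrative and not taken from the notebook:

```python
def iqr_bounds(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) outlier fences for the IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as reported in the output above.
lower, upper = iqr_bounds(77.00, 223.50)
print(lower, upper)  # -142.75 443.25
```

Any price above `upper` would be flagged; the lower fence is negative here, so it cannot flag anything.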
Missing Values Summary:
Column Missing_Count Missing_Percent
bathrooms 69 1.09
bedrooms 13 0.21
beds 13 0.21
Duplicate rows found: 0
Total rows before check: 6,319
No duplicates to remove. Observations after deduplication: 6,319
Logarithmic transformation applied: log_price = ln(1 + price).
Original price range: £10.00 - £74100.00
Transformed range: 2.40 - 11.21
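The transformed range (2.40 for £10.00) and the first `log_price` value shown later (4.844187 for £126.00) match ln(1 + price) rather than ln(price). Assuming that is the transform used, a small sketch of the forward and inverse mapping follows; the helper names are hypothetical:

```python
import math


def to_log_price(price: float) -> float:
    """Forward transform, assumed to be ln(1 + price)."""
    return math.log1p(price)


def from_log_price(log_price: float) -> float:
    """Inverse transform: maps a log-scale value back to pounds."""
    return math.expm1(log_price)


# Minimum price, first sample row, and maximum price from the output above.
for price in (10.0, 126.0, 74100.0):
    print(f"£{price:>9.2f} -> {to_log_price(price):.4f}")
```

Predictions from a model fitted on this scale would need `from_log_price` applied before being read as nightly prices.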
Selected 3 continuous predictors: - accommodates - bedrooms - beds
Added 3 room type variables Total features for model: 6
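Four room types appear in the data but only three indicator columns are added, which suggests one level is dropped as the reference category. The sketch below assumes the reference is "Entire home/apt" (the one level with no `room_` column in the later output); the function name is illustrative:

```python
ROOM_TYPES = ["Entire home/apt", "Hotel room", "Private room", "Shared room"]
REFERENCE = "Entire home/apt"  # assumed baseline level (no dummy column)


def encode_room_type(room_type: str) -> dict[str, bool]:
    """One-hot encode a room type, omitting the reference level."""
    if room_type not in ROOM_TYPES:
        raise ValueError(f"unknown room type: {room_type!r}")
    return {
        f"room_{level}": room_type == level
        for level in ROOM_TYPES
        if level != REFERENCE
    }


print(encode_room_type("Private room"))
print(encode_room_type("Entire home/apt"))  # all indicators False
```

Under this encoding, each room-type coefficient is read as a difference from the reference level.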
Median imputation applied: - bedrooms: 13 values imputed - beds: 13 values imputed
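As a rough illustration of median imputation, the sketch below fills missing entries (represented as `None`) with the median of the observed values and reports how many were filled. The sample values are made up for the example and are not from the dataset:

```python
from statistics import median


def impute_median(values: list) -> tuple[list, int]:
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    n_imputed = len(values) - len(observed)
    return [fill if v is None else v for v in values], n_imputed


bedrooms = [1.0, 3.0, None, 4.0, 2.0, None]  # illustrative values only
filled, n_imputed = impute_median(bedrooms)
print(f"bedrooms: {n_imputed} values imputed -> {filled}")
```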
Analysis dataset prepared.
Observations: 6,319
Predictors: 6
Data types validation:
accommodates int64
bedrooms float64
beds float64
room_Hotel room bool
room_Private room bool
room_Shared room bool
log_price float64
dtype: object
First observations:
| | accommodates | bedrooms | beds | room_Hotel room | room_Private room | room_Shared room | log_price |
|---|---|---|---|---|---|---|---|
| 0 | 4 | 1.0 | 1.0 | False | False | False | 4.844187 |
| 1 | 5 | 3.0 | 3.0 | False | False | False | 5.420535 |
| 2 | 8 | 4.0 | 4.0 | False | False | False | 7.783641 |
| 3 | 4 | 2.0 | 2.0 | False | False | False | 5.017280 |
| 4 | 6 | 4.0 | 2.0 | False | False | False | 5.198497 |

Training data prepared.
Training set: 6,319 observations (100% of data)
Data types in X_train:
accommodates int64
bedrooms float64
beds float64
room_Hotel room bool
room_Private room bool
room_Shared room bool
dtype: object
Note: Model trained on entire dataset for maximum sample size.
BASELINE MODEL Results:
Predictors: accommodates, bedrooms
R² (Coefficient of Determination): 0.3511
Adjusted R²: 0.3509
RMSE (Root Mean Squared Error): 0.6474
Model Performance:
Explained variance: 35.1%
Adjusted for predictors: 35.1%
Effect size: Large (Cohen's f² = 0.541)
FULL MODEL Results:
Number of predictors: 6
R² (Coefficient of Determination): 0.5004
Adjusted R²: 0.4999
RMSE (Root Mean Squared Error): 0.5681
Model Performance:
Explained variance: 50.0%
Adjusted for predictors: 50.0%
Effect size: Large (Cohen's f² = 1.002)
Model Comparison:
Incremental variance explained: 14.9%
ΔR² = 0.1493
Additional predictors provide improvement (ΔR² > 0).
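The adjusted R² and Cohen's f² figures can be reproduced from R², the sample size, and the predictor count. The sketch below uses the standard formulas, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) and f² = R²/(1 - R²), and labels f² with Cohen's conventional benchmarks (0.02 small, 0.15 medium, 0.35 large). The function names are illustrative, and the notebook's own labeling thresholds are not shown in the output:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R² for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def cohens_f2(r2: float) -> float:
    """Cohen's f² effect size for a regression with the given R²."""
    return r2 / (1.0 - r2)


def f2_label(f2: float) -> str:
    """Label f² using Cohen's conventional benchmarks."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"


N = 6319  # observations, as reported above
for name, r2, k in (("baseline", 0.3511, 2), ("full", 0.5004, 6)):
    f2 = cohens_f2(r2)
    print(f"{name}: adj R² = {adjusted_r2(r2, N, k):.4f}, "
          f"f² = {f2:.3f} ({f2_label(f2)})")
```

With more than 6,000 observations and at most six predictors, the adjustment barely moves R², which is why the adjusted and unadjusted percentages round to the same value.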
Data prepared: 6319 observations, 6 features
Data types: [dtype('float64')]
================================================================================
STATSMODELS OLS REGRESSION RESULTS
================================================================================
OLS Regression Results
==============================================================================
Dep. Variable: log_price R-squared: 0.500
Model: OLS Adj. R-squared: 0.500
Method: Least Squares F-statistic: 1054.
Date: Sun, 30 Nov 2025 Prob (F-statistic): 0.00
Time: 18:38:59 Log-Likelihood: -5392.9
No. Observations: 6319 AIC: 1.080e+04
Df Residuals: 6312 BIC: 1.085e+04
Df Model: 6
Covariance Type: nonrobust
=====================================================================================
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 4.6250 | 0.018 | 252.981 | 0.000 | 4.589 | 4.661 |
| accommodates | 0.1122 | 0.007 | 16.752 | 0.000 | 0.099 | 0.125 |
| bedrooms | 0.1644 | 0.012 | 13.828 | 0.000 | 0.141 | 0.188 |
| beds | -0.0523 | 0.009 | -5.857 | 0.000 | -0.070 | -0.035 |
| room_Hotel room | 0.8721 | 0.190 | 4.595 | 0.000 | 0.500 | 1.244 |
| room_Private room | -0.7310 | 0.018 | -40.994 | 0.000 | -0.766 | -0.696 |
| room_Shared room | -1.3692 | 0.134 | -10.244 | 0.000 | -1.631 | -1.107 |
==============================================================================
Omnibus: 2367.847 Durbin-Watson: 2.009
Prob(Omnibus): 0.000 Jarque-Bera (JB): 16823.851
Skew: 1.618 Prob(JB): 0.00
Kurtosis: 10.310 Cond. No. 134.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
================================================================================
BUSINESS-FRIENDLY INTERPRETATION OF KEY STATISTICS
================================================================================
TOP 5 PREDICTORS BY ABSOLUTE COEFFICIENT MAGNITUDE:
==================================================
room_Shared room: β = -1.3692 (negative association)
room_Hotel room: β = 0.8721 (positive association)
room_Private room: β = -0.7310 (negative association)
bedrooms: β = 0.1644 (positive association)
accommodates: β = 0.1122 (positive association)
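Because the dependent variable is a log-transformed price, each coefficient is easier to read as an approximate percentage change, (exp(β) - 1) × 100, holding the other predictors fixed. The sketch below applies that conversion to the reported coefficients. It is an approximation: if the target is ln(1 + price), the percentage strictly applies to price + 1, and the room-type effects are relative to whichever room type was left out as the reference. The function name is illustrative:

```python
import math


def pct_effect(beta: float) -> float:
    """Approximate % change in price per one-unit increase in a predictor."""
    return (math.exp(beta) - 1.0) * 100.0


# Coefficients copied from the regression output above.
coefficients = {
    "room_Shared room": -1.3692,
    "room_Hotel room": 0.8721,
    "room_Private room": -0.7310,
    "bedrooms": 0.1644,
    "accommodates": 0.1122,
}

for name, beta in coefficients.items():
    print(f"{name:<18} β = {beta:+.4f}  ->  {pct_effect(beta):+.1f}%")
```

Ranking by raw coefficient magnitude mixes indicator variables with count variables, so the percentages are comparable only per unit of each predictor.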
Green bars = positive coefficients Red bars = negative coefficients
Variance Inflation Factor (VIF) Analysis:
==================================================
Feature VIF
accommodates 11.604452
bedrooms 9.018756
beds 8.733639
room_Private room 1.140507
room_Shared room 1.046276
room_Hotel room 1.000697
Interpretation:
VIF < 5: Low multicollinearity (acceptable)
VIF 5-10: Moderate multicollinearity (caution)
VIF > 10: High multicollinearity (problematic)
(!) WARNING: 1 predictor(s) exhibit high multicollinearity.
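For context, the VIF of predictor j is defined as 1 / (1 - R_j²), where R_j² comes from regressing that predictor on the others, so a VIF can be converted back to the share of the predictor's variance explained by the rest. The sketch below does that conversion and applies the thresholds listed above; the function names are illustrative, and treating a VIF of exactly 10 as "moderate" is an assumption:

```python
def implied_r2(vif: float) -> float:
    """R² of a predictor regressed on the others, recovered from its VIF."""
    return 1.0 - 1.0 / vif


def vif_label(vif: float) -> str:
    """Classify a VIF using the thresholds in the interpretation above."""
    if vif > 10:
        return "high"
    if vif >= 5:
        return "moderate"
    return "low"


# VIF values copied from the table above.
vifs = {
    "accommodates": 11.604452,
    "bedrooms": 9.018756,
    "beds": 8.733639,
    "room_Private room": 1.140507,
    "room_Shared room": 1.046276,
    "room_Hotel room": 1.000697,
}

for name, vif in vifs.items():
    print(f"{name:<18} VIF = {vif:6.2f}  R² = {implied_r2(vif):.3f}  {vif_label(vif)}")

high = [name for name, vif in vifs.items() if vif_label(vif) == "high"]
print(f"{len(high)} predictor(s) with high multicollinearity: {high}")
```

Only `accommodates` crosses the "high" threshold, but `bedrooms` and `beds` sit close to it, which fits the three capacity measures overlapping heavily.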
Observations proximate to diagonal indicate accurate predictions. Deviation from diagonal represents prediction error.
Mean residual: 0.0000 Std deviation: 0.5681
============================================================
MODEL FIT STATISTICS
============================================================
Baseline Model (k=2):
R² = 0.3511
Adjusted R² = 0.3509
RMSE = 0.6474
Full Model (k=6):
R² = 0.5004
Adjusted R² = 0.4999
RMSE = 0.5681
Model Comparison:
ΔR² = 0.1493
ΔAdjusted R² = 0.1490
Conclusion: Full model justified: Adjusted R² improvement = 14.90%
Average error: 0.0000
Typical error size: 0.4100
Good residuals should be randomly scattered around zero!
Prediction Accuracy Summary:
Exact category match: 55.8%
Within one category: 97.8%
Diagonal = correct predictions (darker = more accurate)
Off-diagonal = misclassifications